Add adaptive embedding throughput shaping for Azure 429 limits #1115

Merged
BenjaminMichaelis merged 5 commits into main from benjaminmichaelis/embedding-throughput-shaping on May 16, 2026

@BenjaminMichaelis
Member

Why

The previous retry-only fix still failed under sustained S0 throttling: large embedding requests kept exhausting retries at the same payload size. We need throughput shaping so rebuilds can continue progressing under rate limits instead of stalling at repeated 429 exhaustion.

What changed

  • Added adaptive batch downshifting in embedding rebuilds:
    • starts at configured max batch size
    • on 429/RateLimitReached, splits throttled batches and retries smaller sub-batches
    • reuses the smaller successful size for subsequent requests in the same run
    • fails clearly if batch size 1 still exhausts retries
  • Added explicit request pacing controls:
    • AIOptions:EmbeddingRetry:MaxEmbeddingBatchSize (default 2048)
    • AIOptions:EmbeddingRetry:MinInterRequestDelayMs (default 250)
    • embedding requests are serialized and paced between calls to reduce sustained RPM pressure
  • Hardened Retry-After parsing:
    • supports retry-after, retry-after-ms, x-ms-retry-after-ms
    • supports extracting a "retry after N seconds" hint from exception message text when no header is available
  • Added coarse progress logging during rebuilds (not per call):
    • logs start configuration
    • logs progress at 10% milestones when total count is known
    • falls back to every 500 chunks when total count is unknown
    • includes current adaptive batch size in progress messages
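
The adaptive downshifting described above can be pictured with a small sketch. It is written in Python for brevity, while the PR's actual change is C# in EmbeddingService.cs and is not reproduced here. `RateLimitError`, `embed_adaptively`, and `embed_batch` are hypothetical names, and the sketch halves the batch iteratively rather than splitting recursively, which is an assumption about one reasonable way to get the same behavior.

```python
from typing import Callable, List, Sequence


class RateLimitError(Exception):
    """Hypothetical stand-in for a 429/RateLimitReached failure after retries are exhausted."""


def embed_adaptively(
    chunks: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    max_batch_size: int = 2048,
) -> List[List[float]]:
    """Embed all chunks, halving the batch size whenever a batch is throttled."""
    batch_size = max_batch_size  # start at the configured maximum
    results: List[List[float]] = []
    index = 0
    while index < len(chunks):
        batch = chunks[index:index + batch_size]
        try:
            results.extend(embed_batch(batch))
            index += len(batch)
        except RateLimitError:
            if len(batch) == 1:
                # Nothing left to split: fail clearly instead of looping forever.
                raise RuntimeError(
                    "Embedding still throttled at batch size 1 after retries"
                )
            # Downshift: retry from the same position with a smaller batch and
            # keep the smaller size for the rest of the run.
            batch_size = max(1, len(batch) // 2)
    return results
```

Because `batch_size` is never raised again, a size that succeeded once is reused for every later request in the same run, matching the third bullet in the downshifting list.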

Validation

  • dotnet build EssentialCSharp.Chat.Shared/EssentialCSharp.Chat.Common.csproj -c Release --nologo
  • dotnet test EssentialCSharp.Chat.Tests/EssentialCSharp.Chat.Tests.csproj -c Release --no-restore -v q

Both passed.

- Downshift embedding batch size on repeated 429s by recursively splitting batches
- Reuse successful smaller batch size for subsequent requests in the same run
- Fail clearly when batch size 1 still receives sustained 429 throttling
- Add sequential request pacing with configurable min inter-request delay
- Add configurable MaxEmbeddingBatchSize and MinInterRequestDelayMs options
- Harden Retry-After parsing for retry-after, retry-after-ms, x-ms-retry-after-ms, and message hints
- Update configuration comments and default appsettings values
- Log embedding rebuild start with known total (when available)
- Emit progress at 10% milestones when total chunk count is known
- Fall back to every 500 chunks when total is unknown
- Include current adaptive batch size in progress logs
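
The milestone rule in these commit notes (10% steps when the total is known, every 500 chunks otherwise) might look roughly like the following. This is a Python sketch under stated assumptions, not the PR's C# code: `progress_milestones` is a hypothetical helper, and treating a total of zero as "unknown" is an assumption.

```python
from typing import List, Optional


def progress_milestones(processed_counts: List[int], total: Optional[int]) -> List[int]:
    """Return the cumulative processed counts at which a progress line would be logged."""
    logged: List[int] = []
    next_percent = 10   # next 10% milestone when the total is known
    next_count = 500    # next chunk-count milestone when the total is unknown
    for processed in processed_counts:
        if total:  # assumption: None or 0 both mean "total unknown"
            # Integer comparison avoids floating-point rounding at the boundary.
            if processed * 100 >= next_percent * total:
                logged.append(processed)
                # Skip past every milestone this batch already crossed.
                while next_percent <= 100 and processed * 100 >= next_percent * total:
                    next_percent += 10
        elif processed >= next_count:
            logged.append(processed)
            while processed >= next_count:
                next_count += 500
    return logged
```

A large batch that crosses several milestones at once produces a single log line, which keeps the logging coarse rather than per call.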
Copilot AI review requested due to automatic review settings May 16, 2026 05:43
Contributor

Copilot AI left a comment

Pull request overview

Improves resilience of Azure OpenAI embedding rebuilds under sustained throttling by introducing adaptive batch downshifting, request pacing, and more robust Retry-After handling so rebuilds can continue progressing instead of repeatedly exhausting retries.

Changes:

  • Added adaptive batch splitting/downshifting on 429/RateLimitReached during embedding rebuild uploads.
  • Serialized and paced embedding requests with a configurable minimum inter-request delay.
  • Hardened Retry-After parsing (more header variants + message parsing) and added coarse rebuild progress logging.
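
To make the Retry-After handling concrete, here is a minimal Python sketch of parsing the three header names listed in the PR description plus a message hint. The function name `retry_delay_seconds`, the precedence (millisecond headers first), and the exact message pattern are assumptions; the PR's C# implementation may differ on all three.

```python
import re
from typing import Mapping, Optional

# Header names taken from the PR description; the -ms variants carry milliseconds.
_MS_HEADERS = ("retry-after-ms", "x-ms-retry-after-ms")
# Assumed message shape, e.g. "Please retry after 26 seconds."
_MESSAGE_HINT = re.compile(r"retry after (\d+) seconds?", re.IGNORECASE)


def retry_delay_seconds(headers: Mapping[str, str], message: str = "") -> Optional[float]:
    """Best-effort retry delay in seconds, or None if no usable hint is present."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    for name in _MS_HEADERS:
        value = lowered.get(name, "")
        if value.isdecimal():
            return int(value) / 1000.0
    value = lowered.get("retry-after", "")
    if value.isdecimal():
        # Only the delta-seconds form is handled; HTTP-date values are ignored here.
        return float(value)
    match = _MESSAGE_HINT.search(message)
    if match:
        return float(match.group(1))
    return None
```

Returning `None` rather than a default leaves the fallback backoff policy to the caller.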

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| EssentialCSharp.Web/appsettings.json | Adds default configuration values for max embedding batch size and inter-request pacing delay. |
| EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs | Implements pacing/serialization, adaptive batch downshifting on throttling, improved Retry-After extraction, and rebuild progress logging. |
| EssentialCSharp.Chat.Shared/Models/EmbeddingRetryOptions.cs | Introduces new retry/pacing configuration knobs with validation. |
| EssentialCSharp.Chat.Shared/Extensions/ServiceCollectionExtensions.cs | Clarifies configuration override semantics for the embedding retry options binding. |
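
As a rough picture of "configuration knobs with validation", the two options named in the PR description and their stated defaults could be modeled as below. This is a Python sketch, not the C# options class; the validation bounds are assumptions, since the page does not state the actual ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingRetryOptions:
    """Sketch of the two knobs named in the PR, with their stated defaults."""
    max_embedding_batch_size: int = 2048
    min_inter_request_delay_ms: int = 250

    def __post_init__(self) -> None:
        # Assumed validation rules; the PR does not spell out the real bounds.
        if self.max_embedding_batch_size < 1:
            raise ValueError("MaxEmbeddingBatchSize must be at least 1")
        if self.min_inter_request_delay_ms < 0:
            raise ValueError("MinInterRequestDelayMs must not be negative")
```

Rejecting bad values at construction time surfaces a misconfiguration at startup instead of partway through a rebuild.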

Comment thread EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs
Comment thread EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs Outdated
- Make embedding pacing timestamp static to match static request lock scope
- Use long arithmetic in percent progress threshold comparison to avoid overflow
Comment thread EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs Fixed
- Make _lastEmbeddingRequestStartedUtc instance-scoped
- Keep pacing behavior unchanged for singleton DI registration
- Log request attempt state before each embedding call with batch sizing fields
- Log successful batch requests using the same structured state event
- Log throttled downshift transitions with old/new effective batch size context
- Add end-of-run successful batch-size summary counts for production tuning
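
The serialized, paced request flow with an instance-scoped start timestamp can be sketched as follows. This is Python for illustration only; `PacedCaller` is a hypothetical name, and keeping both the lock and the timestamp on the instance is an assumption that only behaves like a process-wide limiter when a single instance is shared, as with a singleton registration.

```python
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class PacedCaller:
    """Runs calls one at a time with a minimum gap between request starts."""

    def __init__(
        self,
        min_delay_ms: int = 250,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_delay = min_delay_ms / 1000.0
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()                # serializes requests
        self._last_started: Optional[float] = None   # instance-scoped timestamp

    def call(self, request: Callable[[], T]) -> T:
        with self._lock:
            if self._last_started is not None:
                wait = self._min_delay - (self._clock() - self._last_started)
                if wait > 0:
                    self._sleep(wait)
            # Record the start time just before issuing the request.
            self._last_started = self._clock()
            return request()
```

The injectable `clock` and `sleep` parameters exist only so the pacing can be exercised in tests without real delays.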
@BenjaminMichaelis BenjaminMichaelis merged commit 920c021 into main May 16, 2026
8 checks passed
@BenjaminMichaelis BenjaminMichaelis deleted the benjaminmichaelis/embedding-throughput-shaping branch May 16, 2026 06:21